<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- saved from url=(0032)http://www.id3.org/mp3frame.html -->
<HTML><HEAD><TITLE>The private life of MP3 frames</TITLE>
    <META http-equiv=Content-Type content="text/html; charset=windows-1252">
    <STYLE type=text/css>TD.h1 {
        FONT: 43px Arial, Helvetica
    }

    TD.h2 {
        FONT: 27px Arial, Helvetica
    }

    DIV.h5 {
        FONT: 14px Arial, Helvetica
    }

    TD {
        FONT: 14px Arial, Helvetica
    }

    P.t {
        FONT: 14px Arial, Helvetica
    }

    A {
        COLOR: #dd6600;
        TEXT-DECORATION: none
    }

    B {
        FONT-WEIGHT: bold
    }
    </STYLE>

    <META content="The private life of MP3 frames." name=description>
    <META content="MP3, technical" name=keywords>
    <META content="Microsoft FrontPage 4.0" name=GENERATOR></HEAD>

<BODY text=black bgColor=white>
<CENTER>
<TABLE border=0>
    <TBODY>
        <TR>
            <TD class=h1>&nbsp;The private life of MP3 frames&nbsp;</TD></TR>
        <TR>
            <TD bgColor=#ff7700><IMG height=1 alt=""
                                     src="The private life of MP3 frames_files/fillpx.gif"
                                     width=1></TD></TR></TBODY></TABLE>
<BR>&nbsp;
<TABLE border=0>
    <TBODY>
        <TR>
            <TD class=h2>How is MP3 built?</TD></TR></TBODY></TABLE>
<TABLE border=0>
<TBODY>
<TR vAlign=top>
<TD>
<P>Most people with a little knowledge of MP3 files know that the sound is
    divided into smaller parts and compressed with a psychoacoustic model.
    These smaller pieces of audio are then put into so-called 'frames',
    each of which is a small data block with a header. I'll focus on that
    header in this text.</P>

<P>The header is 4 bytes (32 bits) long and begins with something called
    sync. This sync is, at least according to the MPEG standard, 12 set bits
    in a row. Some add-on standards made later, such as the unofficial
    MPEG-2.5 extension, use 11 set bits and one cleared bit. The sync is
    directly followed by an ID bit, indicating whether the file is an MPEG-1
    or MPEG-2 file: 0=MPEG-2 and 1=MPEG-1.</P>

<P>The layer is defined by the two layer bits. They are oddly defined
    as</P>
<CENTER>
    <TABLE cellSpacing=0 cellPadding=2 border=1>
        <TBODY>
            <TR>
                <TD>0 0</TD>
                <TD>Not defined</TD></TR>
            <TR>
                <TD>0 1</TD>
                <TD>Layer III</TD></TR>
            <TR>
                <TD>1 0</TD>
                <TD>Layer II</TD></TR>
            <TR>
                <TD>1 1</TD>
                <TD>Layer I</TD></TR></TBODY></TABLE>
</CENTER>
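<P>As a minimal sketch, the sync, ID and layer fields could be read from the
    four header bytes like this in Python. The function name
    <I>parse_version_and_layer</I> is my own invention, and the bit positions
    (sync in the top 12 bits, then the ID bit, then the two layer bits) are
    assumed from the order in which the fields are described above.</P>
<PRE>
```python
# Layer bits as listed in the table above; the value 0 0 is not defined.
LAYERS = {0b01: "III", 0b10: "II", 0b11: "I"}


def parse_version_and_layer(header: bytes):
    """Return (mpeg_version, layer) for a 4-byte frame header, or None."""
    if len(header) != 4:
        return None
    word = int.from_bytes(header, "big")      # header as one 32-bit number
    if (word >> 20) != 0xFFF:                 # sync: 12 set bits in a row
        return None
    mpeg_version = 1 if (word >> 19) & 1 else 2   # ID bit: 1=MPEG-1, 0=MPEG-2
    layer = LAYERS.get((word >> 17) & 0b11)       # assumed position: bits 18..17
    if layer is None:                             # undefined layer value
        return None
    return mpeg_version, layer


print(parse_version_and_layer(bytes([0xFF, 0xFB, 0x90, 0x00])))
```
</PRE>
<P>Headers that use the 11-bit sync of the later add-on standards are
    rejected by this sketch, since it only accepts the 12-bit sync.</P>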

<P>With the ID and layer bits and the value of the bitrate field we can
    determine the bitrate of the audio (in kbit/s) according to this
    table.</P>
<CENTER>
<TABLE cellSpacing=0 cellPadding=2 border=1>
<TBODY>
<TR>
    <TD>Bitrate<BR>value</TD>
    <TD>MPEG-1,<BR>layer I</TD>
    <TD>MPEG-1,<BR>layer II</TD>
    <TD>MPEG-1,<BR>layer III</TD>
    <TD>MPEG-2,<BR>layer I</TD>
    <TD>MPEG-2,<BR>layer II</TD>
    <TD>MPEG-2,<BR>layer III</TD></TR>
<TR>
    <TD>0 0 0 0</TD>
    <TD>free</TD>
    <TD>free</TD>
    <TD>free</TD>
    <TD>free</TD>
    <TD>free</TD>
    <TD>free</TD></TR>
<TR>
    <TD>0 0 0 1</TD>
    <TD>32</TD>
    <TD>32</TD>
    <TD>32</TD>
    <TD>32</TD>
    <TD>8</TD>
    <TD>8</TD></TR>
<TR>
    <TD>0 0 1 0</TD>
    <TD>64</TD>
    <TD>48</TD>
    <TD>40</TD>
    <TD>48</TD>
    <TD>16</TD>
    <TD>16</TD></TR>
<TR>
    <TD>0 0 1 1</TD>
    <TD>96</TD>
    <TD>56</TD>
    <TD>48</TD>
    <TD>56</TD>
    <TD>24</TD>
    <TD>24</TD></TR>
<TR>
    <TD>0 1 0 0</TD>
    <TD>128</TD>
    <TD>64</TD>
    <TD>56</TD>
    <TD>64</TD>
    <TD>32</TD>
    <TD>32</TD></TR>
<TR>
    <TD>0 1 0 1</TD>
    <TD>160</TD>
    <TD>80</TD>
    <TD>64</TD>
    <TD>80</TD>
    <TD>40</TD>
    <TD>40</TD></TR>
<TR>
    <TD>0 1 1 0</TD>
    <TD>192</TD>
    <TD>96</TD>
    <TD>80</TD>
    <TD>96</TD>
    <TD>48</TD>
    <TD>48</TD></TR>
<TR>
    <TD>0 1 1 1</TD>
    <TD>224</TD>
    <TD>112</TD>
    <TD>96</TD>
    <TD>112</TD>
    <TD>56</TD>
    <TD>56</TD></TR>
<TR>
    <TD>1 0 0 0</TD>
    <TD>256</TD>
    <TD>128</TD>
    <TD>112</TD>
    <TD>128</TD>
    <TD>64</TD>
    <TD>64</TD></TR>
<TR>
    <TD>1 0 0 1</TD>
    <TD>288</TD>
    <TD>160</TD>
    <TD>128</TD>
    <TD>144</TD>
    <TD>80</TD>
    <TD>80</TD></TR>
<TR>
    <TD>1 0 1 0</TD>
    <TD>320</TD>
    <TD>192</TD>
    <TD>160</TD>
    <TD>160</TD>
    <TD>96</TD>
    <TD>96</TD></TR>
<TR>
    <TD>1 0 1 1</TD>
    <TD>352</TD>
    <TD>224</TD>
    <TD>192</TD>
    <TD>176</TD>
    <TD>112</TD>
    <TD>112</TD></TR>
<TR>
    <TD>1 1 0 0</TD>
    <TD>384</TD>
    <TD>256</TD>
    <TD>224</TD>
    <TD>192</TD>
    <TD>128</TD>
    <TD>128</TD></TR>
<TR>
    <TD>1 1 0 1</TD>
    <TD>416</TD>
    <TD>320</TD>
    <TD>256</TD>
    <TD>224</TD>
    <TD>144</TD>
    <TD>144</TD></TR>
<TR>
    <TD>1 1 1 0</TD>
    <TD>448</TD>
    <TD>384</TD>
    <TD>320</TD>
    <TD>256</TD>
    <TD>160</TD>
    <TD>160</TD></TR>
<TR>
    <TD>1 1 1 1</TD>
    <TD>not allowed</TD>
    <TD>not allowed</TD>
    <TD>not allowed</TD>
    <TD>not allowed</TD>
    <TD>not allowed</TD>
    <TD>not allowed</TD></TR></TBODY></TABLE>
</CENTER>
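<P>In code, such a table is most naturally a lookup indexed by the four
    bitrate bits. The following Python sketch is restricted to the three
    MPEG-1 columns to keep it short. The names <I>MPEG1_BITRATES</I> and
    <I>mpeg1_bitrate_kbps</I> are made up for this example, and the position
    of the bitrate field (bits 15 to 12 of the 32-bit header) is an
    assumption based on the usual header layout.</P>
<PRE>
```python
# MPEG-1 bitrates in kbit/s, indexed by the 4-bit bitrate value.
# Index 0 and index 15 carry no usable bitrate, so they map to None.
MPEG1_BITRATES = {
    "I":   [None, 32, 64, 96, 128, 160, 192, 224,
            256, 288, 320, 352, 384, 416, 448, None],
    "II":  [None, 32, 48, 56, 64, 80, 96, 112,
            128, 160, 192, 224, 256, 320, 384, None],
    "III": [None, 32, 40, 48, 56, 64, 80, 96,
            112, 128, 160, 192, 224, 256, 320, None],
}


def mpeg1_bitrate_kbps(header: bytes, layer: str):
    """Look up the bitrate of an MPEG-1 frame; layer is "I", "II" or "III"."""
    word = int.from_bytes(header, "big")
    index = (word >> 12) & 0xF          # assumed position: bits 15..12
    return MPEG1_BITRATES[layer][index]


print(mpeg1_bitrate_kbps(bytes([0xFF, 0xFB, 0x90, 0x00]), "III"))
```
</PRE>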

<P>The sample rate is described in the frequency field. Its values
    depend on which MPEG standard is used, according to the following
    table.</P>
<CENTER>
    <TABLE cellSpacing=0 cellPadding=2 border=1>
        <TBODY>
            <TR>
                <TD>Frequency<BR>value</TD>
                <TD>MPEG-1</TD>
                <TD>MPEG-2</TD></TR>
            <TR>
                <TD>0 0</TD>
                <TD>44100 Hz</TD>
                <TD>22050 Hz</TD></TR>
            <TR>
                <TD>0 1</TD>
                <TD>48000 Hz</TD>
                <TD>24000 Hz</TD></TR>
            <TR>
                <TD>1 0</TD>
                <TD>32000 Hz</TD>
                <TD>16000 Hz</TD></TR>
            <TR>
                <TD>1 1</TD>
                <TD></TD>
                <TD></TD></TR></TBODY></TABLE>
</CENTER>
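<P>The same lookup idea works for the sample rate. This is only a sketch:
    the names are hypothetical, and I assume the frequency field sits in
    bits 11 to 10 of the header and the ID bit in bit 19.</P>
<PRE>
```python
# Sample rates in Hz from the table above, indexed by the frequency value.
# The value 1 1 has no entry, so it maps to None.
SAMPLE_RATES = {
    1: [44100, 48000, 32000, None],   # MPEG-1
    2: [22050, 24000, 16000, None],   # MPEG-2
}


def sample_rate_hz(header: bytes):
    """Return the sample rate in Hz, or None for the undefined value."""
    word = int.from_bytes(header, "big")
    mpeg_version = 1 if (word >> 19) & 1 else 2   # ID bit: 1=MPEG-1, 0=MPEG-2
    index = (word >> 10) & 0b11                   # assumed position: bits 11..10
    return SAMPLE_RATES[mpeg_version][index]


print(sample_rate_hz(bytes([0xFF, 0xFB, 0x90, 0x00])))
```
</PRE>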

<P>Three bits are not needed in the decoding process at all. These are the
    copyright bit, the original home bit and the private bit. The copyright
    bit has the same meaning as the copyright bit on CDs and DAT tapes, i.e.
    it tells that it is illegal to copy the contents if the bit is set. The
    original home bit indicates, if set, that the frame is located on its
    original media. The private bit has no meaning defined by the standard;
    it is left for application-specific use.</P>
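<P>Since these three bits do not affect decoding, a program can simply
    report them. A small Python sketch, with a made-up function name and
    with bit positions that I assume from the usual header layout (private
    bit at bit 8, copyright at bit 3, original home at bit 2):</P>
<PRE>
```python
def informational_bits(header: bytes) -> dict:
    """Extract the three bits that the decoder itself does not need."""
    word = int.from_bytes(header, "big")
    return {
        "private": bool((word >> 8) & 1),         # assumed position: bit 8
        "copyright": bool((word >> 3) & 1),       # assumed position: bit 3
        "original_home": bool((word >> 2) & 1),   # assumed position: bit 2
    }


print(informational_bits(bytes([0xFF, 0xFB, 0x90, 0x04])))
```
</PRE>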

<P>

<P>If the protection bit is NOT set then the frame header is followed by a
    16-bit checksum, inserted before the audio data. If the padding bit is set
    then the frame is padded with an extra byte. Knowing this, the size of the
    complete frame in bytes can be calculated with the following formula,
    which holds for MPEG-1 Layer II and Layer III (other combinations use a
    different constant)</P>
<CENTER>
    <P>FrameSize = 144 * BitRate / SampleRate<BR>when the padding bit is
        cleared and</P>

    <P>FrameSize = (144 * BitRate / SampleRate) + 1<BR>when the padding bit is
        set.</P>
</CENTER>

<P>The FrameSize is of course an integer, so the result of the division is
    truncated. For example, if BitRate=128000, SampleRate=44100 and the
    padding bit is cleared, then
    FrameSize = 144 * 128000 / 44100 = 417 (417.96 truncated).</P>
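<P>The calculation above can be sketched in a few lines of Python. The
    function names are my own, BitRate is taken in bit/s as in the example,
    and the position of the protection bit (bit 16 of the header) is an
    assumption based on the usual header layout.</P>
<PRE>
```python
def frame_size(bitrate_bps: int, sample_rate_hz: int, padding: bool) -> int:
    """Frame size in bytes, using the formula from the text."""
    size = 144 * bitrate_bps // sample_rate_hz   # floor division truncates
    return size + 1 if padding else size


def has_checksum(header: bytes) -> bool:
    """True when the protection bit is NOT set, so a 16-bit checksum follows."""
    word = int.from_bytes(header, "big")
    return ((word >> 16) & 1) == 0               # assumed position: bit 16


print(frame_size(128000, 44100, padding=False))
print(frame_size(128000, 44100, padding=True))
```
</PRE>
<P>Multiplying before dividing keeps everything in integers, so no rounding
    error from floating point can creep in.</P>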

<P>

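<P>The mode field tells which sort of stereo/mono encoding has been used.
    The mode extension field is only meaningful in joint stereo mode, and its
    purpose differs between layers: in Layer I and II it selects the subbands
    to which intensity stereo is applied, and in Layer III it indicates
    whether intensity stereo and/or MS stereo are used.</P>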
<CENTER>
    <TABLE cellSpacing=0 cellPadding=2 border=1>
        <TBODY>
            <TR>
                <TD>Mode value</TD>
                <TD>mode</TD></TR>
            <TR>
                <TD>0 0</TD>
                <TD>Stereo</TD></TR>
            <TR>
                <TD>0 1</TD>
                <TD>Joint stereo</TD></TR>
            <TR>
                <TD>1 0</TD>
                <TD>Dual channel</TD></TR>
            <TR>
                <TD>1 1</TD>
                <TD>Mono</TD></TR></TBODY></TABLE>
</CENTER>
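<P>Decoding the mode field is again a plain table lookup. In this Python
    sketch the names are hypothetical, and I assume that the mode field
    occupies bits 7 to 6 and the mode extension bits 5 to 4 of the header.
    The mode extension is returned as a raw number without interpretation.</P>
<PRE>
```python
# Mode names from the table above, indexed by the 2-bit mode value.
MODES = ["Stereo", "Joint stereo", "Dual channel", "Mono"]


def stereo_mode(header: bytes):
    """Return (mode name, raw mode extension value, number of channels)."""
    word = int.from_bytes(header, "big")
    mode = MODES[(word >> 6) & 0b11]          # assumed position: bits 7..6
    mode_extension = (word >> 4) & 0b11       # assumed position: bits 5..4
    channels = 1 if mode == "Mono" else 2
    return mode, mode_extension, channels


print(stereo_mode(bytes([0xFF, 0xFB, 0x90, 0x64])))
```
</PRE>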

<P>The last field is the emphasis field. It is used to sort of
    're-equalize' the sound after a Dolby-like noise suppression. It is
    rarely used and probably never will be. The following noise suppression
    models are defined</P>
<CENTER>
    <TABLE cellSpacing=0 cellPadding=2 border=1>
        <TBODY>
            <TR>
                <TD>Emphasis value</TD>
                <TD>Emphasis method</TD></TR>
            <TR>
                <TD>0 0</TD>
                <TD>none</TD></TR>
            <TR>
                <TD>0 1</TD>
                <TD>50/15 &micro;s</TD></TR>
            <TR>
                <TD>1 0</TD>
                <TD></TD></TR>
            <TR>
                <TD>1 1</TD>
                <TD>CCITT J.17</TD></TR></TBODY></TABLE>
</CENTER></TD>
<TD align=middle width=230><IMG height=426 alt="Frame header."
                                src="The private life of MP3 frames_files/mp3frame_blocks.gif" width=135>
    <BR><I>Frame header.</I></TD></TR></TBODY></TABLE>
<BR>&nbsp;
</CENTER></BODY></HTML>
